# 03. Experience Replay

## Summary

When the agent interacts with the environment, the sequence of experience tuples can be highly correlated. A naive Q-learning algorithm that learns from these experience tuples in sequential order risks being swayed by the effects of this correlation. By instead keeping a replay buffer and using experience replay to sample from the buffer at random, we can prevent the action values from oscillating or diverging catastrophically.

The replay buffer contains a collection of experience tuples (S, A, R, S'). Tuples are gradually added to the buffer as we interact with the environment.

The act of sampling a small batch of tuples from the replay buffer in order to learn is known as experience replay. In addition to breaking harmful correlations, experience replay allows us to learn from individual tuples multiple times, recall rare occurrences, and in general make better use of our experience.
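As a rough illustration of these two ideas, here is a minimal sketch of a replay buffer in Python. The class name, buffer size, and batch size are illustrative choices, not part of the lesson; a fixed-length `deque` simply evicts the oldest tuples once the buffer is full, and `random.sample` draws a uniformly random mini-batch.

```python
import random
from collections import deque, namedtuple

# Hypothetical container for one (S, A, R, S') experience tuple.
Experience = namedtuple("Experience",
                        ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer that stores experience tuples for random sampling."""

    def __init__(self, buffer_size=100_000, batch_size=64):
        # When full, appending a new tuple evicts the oldest one.
        self.memory = deque(maxlen=buffer_size)
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        """Add one experience tuple as the agent interacts with the environment."""
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        """Draw a random mini-batch, breaking the temporal correlation
        between consecutive tuples."""
        return random.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```

In a typical training loop, the agent would call `add(...)` at every time step and, once `len(buffer) >= batch_size`, periodically call `sample()` and run a learning update on the resulting batch; any given tuple can be sampled many times before it is evicted.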

## Quiz

Which of the following are true? Select all that apply.

SOLUTION:
  • Experience replay is based on the idea that we can learn better if we do multiple passes over the same experience.
  • Experience replay is used to generate uncorrelated experience data for online training of deep RL agents.